The end of Dennard scaling has led designers to embrace architectural
and microarchitectural heterogeneity to improve the energy-efficiency
of computation. The same CMOS scaling trends have also led to
significant research on alternatives to CMOS itself. Importantly, none
of the candidate technologies appear likely to completely supplant
CMOS, although they offer different tradeoffs between serial
performance and energy efficiency. With 3D stacking, process
heterogeneity is now achievable, further expanding the space of
heterogeneous processing to include modules utilizing these post-CMOS
devices alongside CMOS components.

Achieving superior performance with slower, but more efficient,
devices requires exploiting parallelism. In this paper, we focus on
the Heterojunction Tunnel FET (TFET) as a CMOS alternative, and
consider the impacts of device heterogeneity across three forms of
parallelism: instruction-level (ILP), thread-level (TLP), and
data-level (DLP). We show that, at both the ILP and TLP levels, some
workloads favor CMOS designs while others favor TFET designs, warranting heterogeneous
solutions featuring both CMOS and TFET cores, and describe a
heterogeneous system that can exploit application preferences. We also
examine DLP in the form of accelerator architectures designed with
both CMOS and TFET libraries, and show that the highly regular and
exploitable parallelism heavily favors TFET solutions.
% LocalWords:  Dennard microarchitectural CMOS tradeoffs 3D Heterojunction FET
% LocalWords:  TFET ILP TLP DLP
